Resource Health

Overview

Argo CD provides built-in health assessment for several standard Kubernetes types. The health of each resource is then rolled up into the overall Application health status. The following checks are made for specific types of Kubernetes resources:

Deployment, ReplicaSet, StatefulSet, DaemonSet

  • Observed generation is equal to desired generation.
  • Number of updated replicas equals the number of desired replicas.

Service

  • If the Service is of type LoadBalancer, the status.loadBalancer.ingress list is non-empty, with at least one value for hostname or IP.
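
As an illustration, the status fragment below is a minimal sketch of what a LoadBalancer Service that passes this check might report. The IP address is a placeholder from a documentation range, not a value taken from any real cluster.

```yaml
# Fragment of a Service object (type: LoadBalancer) as reported by the cluster.
# The ingress list has one entry with an IP, so the check described above passes.
status:
  loadBalancer:
    ingress:
    - ip: 203.0.113.10   # placeholder address; a hostname field would also satisfy the check
```

An empty or missing ingress list would mean the load balancer has not been provisioned yet.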

Ingress

  • The status.loadBalancer.ingress list is non-empty, with at least one value for hostname or IP.

Job

  • If the Job's .spec.suspend field is set to true, the Job and the app health are marked as Suspended.

PersistentVolumeClaim

  • The status.phase is Bound

Argo CD App

Health assessment of the argoproj.io/Application CRD was removed in Argo CD 1.8 (see #3781 for more information). You might need to restore it if you use the app-of-apps pattern and orchestrate synchronization using sync waves. To do so, add the following resource customization to the argocd-cm ConfigMap:

    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: argocd-cm
      namespace: argocd
      labels:
        app.kubernetes.io/name: argocd-cm
        app.kubernetes.io/part-of: argocd
    data:
      resource.customizations: |
        argoproj.io/Application:
          health.lua: |
            hs = {}
            hs.status = "Progressing"
            hs.message = ""
            if obj.status ~= nil then
              if obj.status.health ~= nil then
                hs.status = obj.status.health.status
                if obj.status.health.message ~= nil then
                  hs.message = obj.status.health.message
                end
              end
            end
            return hs
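
For context, in an app-of-apps setup each child Application can carry a sync-wave annotation, and a later wave is only started once the earlier one is healthy, which is why the Application health check above matters. The manifest below is a minimal sketch of such a child; the application name, repository URL, path, and destination namespace are placeholders invented for this example.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: child-app                      # hypothetical name
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "1"  # synced after resources in lower-numbered waves
spec:
  project: default
  source:
    repoURL: https://example.com/org/repo.git  # placeholder repository
    path: apps/child-app                       # placeholder path
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: child-app                       # placeholder namespace
```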

Custom Health Checks

Argo CD supports custom health checks written in Lua. This is useful if you:

  • Are affected by known issues where your Ingress or StatefulSet resources are stuck in the Progressing state because of a bug in your resource controller.
  • Have a custom resource for which Argo CD does not have a built-in health check.

There are two ways to configure a custom health check, described in the next two sections.

Way 1. Define a Custom Health Check in argocd-cm ConfigMap

Custom health checks can be defined in the following field of the argocd-cm ConfigMap:

    resource.customizations: |
      <group/kind>:
        health.lua: |

If you are using argocd-operator, this is overridden by the argocd-operator resourceCustomizations.

The following example demonstrates a health check for cert-manager.io/Certificate.

    data:
      resource.customizations: |
        cert-manager.io/Certificate:
          health.lua: |
            hs = {}
            if obj.status ~= nil then
              if obj.status.conditions ~= nil then
                for i, condition in ipairs(obj.status.conditions) do
                  if condition.type == "Ready" and condition.status == "False" then
                    hs.status = "Degraded"
                    hs.message = condition.message
                    return hs
                  end
                  if condition.type == "Ready" and condition.status == "True" then
                    hs.status = "Healthy"
                    hs.message = condition.message
                    return hs
                  end
                end
              end
            end
            hs.status = "Progressing"
            hs.message = "Waiting for certificate"
            return hs

To avoid duplicating a custom health check across multiple resources, you can use a wildcard in the resource kind and anywhere in the resource group, like this:

    resource.customizations: |
      ec2.aws.crossplane.io/*:
        health.lua: |
          ...

    resource.customizations: |
      "*.aws.crossplane.io/*":
        health.lua: |
          ...

Important

The key in the resource customization health section must be quoted if the wildcard starts with *.

obj is a global variable that contains the resource. The script must return an object with a status field and an optional message field. A custom health check can return one of the following health statuses:

  • Healthy - the resource is healthy
  • Progressing - the resource is not healthy yet but still making progress and might be healthy soon
  • Degraded - the resource is degraded
  • Suspended - the resource is suspended and waiting for some external event to resume (e.g. suspended CronJob or paused Deployment)

By default, health typically returns a Progressing status.
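
To illustrate the return contract described above, the following is a minimal sketch of a check for a hypothetical custom resource, example.com/Widget. The group, the kind, and the spec.paused and status.ready fields are assumptions made up for this example; they do not belong to Argo CD or to any real CRD.

```yaml
data:
  resource.customizations: |
    example.com/Widget:          # hypothetical group/kind
      health.lua: |
        hs = {}
        -- Assumed field: spec.paused marks the resource as intentionally stopped
        if obj.spec ~= nil and obj.spec.paused == true then
          hs.status = "Suspended"
          hs.message = "Widget is paused"
          return hs
        end
        -- Assumed field: status.ready is set by the (hypothetical) controller
        if obj.status ~= nil and obj.status.ready == true then
          hs.status = "Healthy"
          hs.message = "Widget is ready"
          return hs
        end
        -- Fall through to Progressing, mirroring the default described above
        hs.status = "Progressing"
        hs.message = "Waiting for Widget to become ready"
        return hs
```

The script returns early on the first matching state and falls back to Progressing, so a resource whose controller has not yet written any status is not reported as Degraded.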

NOTE: As a security measure, access to the standard Lua libraries is disabled by default. Admins can control access by setting resource.customizations.useOpenLibs.<group_kind>. In the following example, standard libraries are enabled for the health check of cert-manager.io/Certificate.

    data:
      resource.customizations.useOpenLibs.cert-manager.io_Certificate: "true"
      resource.customizations.health.cert-manager.io_Certificate: |
        -- Lua standard libraries are enabled for this script

Way 2. Contribute a Custom Health Check

A health check can be bundled into Argo CD. Custom health check scripts are located in the resource_customizations directory of https://github.com/argoproj/argo-cd. The directory must have the following structure:

    argo-cd
    |-- resource_customizations
    |    |-- your.crd.group.io               # CRD group
    |    |    |-- MyKind                     # Resource kind
    |    |    |    |-- health.lua            # Health check
    |    |    |    |-- health_test.yaml      # Test inputs and expected results
    |    |    |    +-- testdata              # Directory with test resource YAML definitions

Each health check must have tests defined in a health_test.yaml file, which has the following structure:

    tests:
    - healthStatus:
        status: ExpectedStatus
        message: Expected message
      inputPath: testdata/test-resource-definition.yaml
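
As a sketch of how this structure might be filled in, the file below lists two cases for a check such as the cert-manager.io/Certificate example. The testdata file names and the Healthy message are assumptions chosen for illustration; the Progressing message is the one that example script returns when no Ready condition is present.

```yaml
tests:
# Resource with no Ready condition yet: the script falls through to Progressing.
- healthStatus:
    status: Progressing
    message: Waiting for certificate
  inputPath: testdata/progressing.yaml   # hypothetical file name
# Resource whose Ready condition is "True": the message is copied from the condition.
- healthStatus:
    status: Healthy
    message: Certificate is up to date   # assumed to match the condition message in the test data
  inputPath: testdata/healthy.yaml       # hypothetical file name
```

Each entry pairs one input resource with the status and message the script is expected to return for it, so it is worth having at least one case per status the script can produce.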

To test the implemented custom health checks, run go test -v ./util/lua/.

PR #1139 is an example of a custom health check for cert-manager CRDs.

Please note that bundled health checks with wildcards are not supported.